-
Notifications
You must be signed in to change notification settings - Fork 3
Added symmetry-breaking init in rbf #42
base: main
Are you sure you want to change the base?
Conversation
Hi @bodin-e This could be handled through a simple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this to be merged, we’ll also need a test. This can be done by reusing the key
used in init_params
to draw a uniform noise vector and assert that initialised lengthscale matches this. There’s no need for a new test - you can simply adapt the existing tests that are failing.
@@ -64,8 +64,10 @@ def __call__( | |||
return K.squeeze() | |||
|
|||
def init_params(self, key: KeyArray) -> Dict: | |||
eps = 1e-3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for 1e-3
. I’d personally like the jitter to be as small as possible.
Yes, I would expect it only being needed when using a combination kernel. It would be needed in all cases where any two (or more) kernels are identical, receive identical inputs and is depended on in an identical way. Which in practice probably would just be in the combination kernel case (at least I cannot think of another case now). However, it does not only matter for the lengthscale parameter. It would be sufficient to break the symmetry by using different initialization of any parameter in each kernel, such as for example the lengthscale, but all kernels used within a combination kernel may not have a lengthscale. And I think it could end up cumbersome to maintain a set of parameter names {"lengthscale", ...} to include at least one parameter name from each of the library kernels, although not impossible. I completely agree with the need for a nicer and more maintainable solution than the one in this PR though. The main purpose of this PR I had in mind was to highlight the issue (it would have been better to include an example within the issue ticket instead though). Yet, I'm thinking it may be the easiest to maintain and most readable to let the init_params of each kernel initialize itself sensibly, including a stochastic component to it; added to the parameter that it makes sense to add it to for the given kernel (and with a suitable jitter scale for that particular parameter). For example, in the RBF case we (and a user using the library) can easily read within the function scope that the default lengthscale is 1; and we know that a small jitter constant (say 1e-3 or 1e-4) shouldn't be semantically meaningful for the lengthscale in the RBF (in relation to value 1). In contrast, in the some kernels or setups some parameters can be much more sensitive to its value. If a user uses such as kernel they would easily be able to see how the particular kernel they use is initialised, and can themselves then override the init_params behaviour or use some other way to initialize it. What do you think of the following? Basically doing exactly what you purpose, which is to reuse initialization logic for the lengthscale (and other parameters when needed). But keeping this logic as separate functions and use them explicitly where needed, to keep high readability and reduce the risk of the user missing what is happening (and getting unexpected outcomes because of it). Specifically, keeping a file or module with parameter initialize functions somewhere, with functions such as something like: |
The init_params method currently initializes (for example) the rbf kernel deterministically. As such, when using a combination kernel of the same kernel types, the kernels are initialized to be identical. This is problematic as the symmetry between them forces them to stay the same using gradient-based local optimization, preventing e.g. a sum of multiple lengthscales to be learnt (if not breaking the symmetries in some other way).
This adds a random jitter to the RBF kernels lengthscales to break the symmetry. To resolve the same issue for the other kernels, they would need something similar.
The following PR makes the change to pass a unique key to each kernel within the combination kernel:
#41